Conversation

@yushengsu-thu (Collaborator) commented Jan 7, 2026

Description

  1. [to-do] Refactor (move more features to megatron-bridge) and fix bugs
  2. Megatron backend
  3. Disk-based weight sync
  4. Update LoRA weights via tensor
    (Push LoRA weights to the SGLang rollout engine via tensor, which is faster than the previous disk-sync approach; see the sketch after this list.)
    Waiting for this SGLang PR to be merged: Update LoRA Weights via Tensor sgl-project/sglang#16226
  5. SGLang patch required by this PR, in /python/sglang/srt/models/qwen2.py at line 611:
# Avoid substring match: skip if name already contains the fused param_name,
# e.g., skip the "v_proj" match when name already contains "qkv_proj"
if param_name in name:
    continue
  6. [to-do] Fix the weight sync problem in LoRA (the Megatron engine is currently not offloaded); with --offload-rollout-level kv_cache, the weight part is not handled correctly.
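
To make the difference between items 3 and 4 above concrete, here is a minimal, self-contained sketch of the two sync paths. It is not this PR's implementation: RolloutEngine and both of its method names are hypothetical stand-ins, and the real tensor-based interface is the one added in sgl-project/sglang#16226.

import os
import tempfile

import torch


class RolloutEngine:
    """Stand-in for the SGLang rollout engine; the method names are hypothetical."""

    def __init__(self) -> None:
        self.lora_weights = {}

    def load_lora_from_disk(self, path: str) -> None:
        # Disk sync: the trainer serializes the adapter and the engine reloads it.
        self.lora_weights = torch.load(path)

    def update_lora_from_tensor(self, named_tensors: dict) -> None:
        # Tensor sync: weights are handed over in memory (NCCL/CUDA IPC in a
        # real multi-process setup), skipping the filesystem round-trip.
        self.lora_weights = {name: t.clone() for name, t in named_tensors.items()}


# Toy LoRA adapter for one projection (rank 8; 896 is Qwen2.5-0.5B's hidden size).
lora = {
    "q_proj.lora_A": torch.randn(8, 896),
    "q_proj.lora_B": torch.randn(896, 8),
}
engine = RolloutEngine()

# Path 1: disk sync -- one torch.save plus one torch.load per update.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "lora_adapter.pt")
    torch.save(lora, ckpt)
    engine.load_lora_from_disk(ckpt)

# Path 2: tensor sync -- the adapter never touches the filesystem.
engine.update_lora_from_tensor(lora)

Both paths leave the engine with the same adapter; the tensor path just replaces a save/load pair plus filesystem I/O with an in-memory hand-off, which is why item 4 describes it as faster.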

Prerequisites

  • Docker
docker run --rm -it \
  --gpus all \
  -p 8264:8264 \
  --cap-add SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --privileged \
  -v /.ssh/:/.ssh/ \
  -v /data:/data \
  --shm-size 128G \
  --name miles_yusheng \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -w $PWD \
  radixark/miles:latest 
  • Megatron-Bridge
git clone --branch merged-megatron-0.16.0rc0 --single-branch https://github.com/yushengsu-thu/Megatron-Bridge.git
cd Megatron-Bridge
pip install -e . --no-deps --no-build-isolation
pip install megatron-energon --no-deps
pip install multi-storage-client --no-deps

Testing

# Dataset and model download
huggingface-cli download --repo-type dataset zhuzilin/gsm8k --local-dir /root/gsm8k
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir /root/Qwen2.5-0.5B-Instruct

# Codebase
git clone --branch miles-lora-megatron --single-branch https://github.com/yushengsu-thu/miles.git 
cd miles
source scripts/models/qwen2.5-0.5B.sh
PYTHONPATH=/root/Megatron-LM/ python \
   tools/convert_hf_to_torch_dist.py \
   "${MODEL_ARGS[@]}" \
   --hf-checkpoint /root/Qwen2.5-0.5B-Instruct \
   --save /root/Qwen2.5-0.5B-Instruct_torch_dist/

# Run script:
bash examples/reproducibility/run-qwen2.5-0.5B-gsm8k-lora.sh

Related Issues and PRs

LoRA FSDP backend PR: #377
SGLang sync from tensor: sgl-project/sglang#16226

Code Style Compliance

  • Performance: Minimized synchronization calls (.item(), .cpu(), .tolist()) in inference paths
  • Architecture: No duplicate code > 5 lines; files < 2,000 lines
  • Function Purity: Avoided in-place modification of input arguments (unless explicitly documented for memory optimization)
  • Pythonic: Lean constructors, minimal dynamic attributes, proper type hints on public APIs
  • Testing: Provided a test script that reviewers can copy & paste to run immediately

Copilot AI review requested due to automatic review settings January 7, 2026 20:21
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @yushengsu-thu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a work in progress that integrates a Megatron backend for LoRA training into the miles framework. It aims to make LoRA weight updates more efficient by using tensor-based synchronization with SGLang instead of disk-based methods. The changes provide foundational support for scalable LoRA fine-tuning with Megatron-LM, demonstrated through a new example script.

Highlights

  • Megatron LoRA Backend Integration: This pull request introduces a Megatron backend for LoRA (Low-Rank Adaptation) training, enabling the use of Megatron-LM for fine-tuning models with LoRA within the miles framework.
  • Tensor-based LoRA Weight Updates: The implementation supports updating LoRA weights via tensors, which is noted as a faster and more efficient method compared to previous disk synchronization approaches. This feature is dependent on an external SGLang pull request.
  • New Example Script for LoRA Training: A new example script (run-qwen2.5-0.5B-gsm8k-lora.sh) has been added, demonstrating how to perform LoRA training for the Qwen2.5-0.5B model on the GSM8k dataset using the new Megatron backend and SGLang.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new reproducibility script for running a LoRA fine-tuning experiment on the Qwen2.5-0.5B model with the GSM8K dataset using the Megatron backend. The script is well-structured, using bash arrays to organize command-line arguments for clarity.

My review focuses on improving the robustness and correctness of the script. I've provided two main suggestions:

  1. Refining the initial cleanup logic to be more robust and safer, by preferring graceful shutdowns and highlighting the risk of using a broad pkill on all python processes.
  2. Correcting the environment variable used to disable Python's output buffering from the non-standard PYTHONBUFFERED to the correct PYTHONUNBUFFERED.

These changes should make the script more reliable and adhere to better practices.

Comment on lines +4 to +11
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python
Severity: medium

The cleanup logic at the beginning of the script is quite aggressive and could be improved for safety and robustness.

  • Graceful Shutdown: Using pkill -9 (SIGKILL) immediately prevents processes from cleaning up properly. It's better to first try a graceful shutdown with pkill (SIGTERM).
  • Broad pkill: pkill -9 python is very broad and could terminate unrelated Python processes, which is risky outside of a completely isolated container.
  • Redundancy: The repeated pkill commands suggest the cleanup might be fragile. A single, more robust cleanup sequence is preferable.
Suggested change
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python
pkill sglang
ray stop --force
sleep 5 # Wait for processes to terminate gracefully
# Force kill any remaining processes.
# Note: `pkill -9 python` is broad and can be risky.
pkill -9 sglang
pkill -9 ray
pkill -9 python

set -ex

# will prevent ray from buffering stdout/stderr
export PYTHONBUFFERED=16
Severity: medium

The environment variable PYTHONBUFFERED is not standard. The correct variable to disable output buffering for Python is PYTHONUNBUFFERED. Setting it to any non-empty string (conventionally 1) will have the desired effect of making stdout/stderr unbuffered.

Suggested change
export PYTHONBUFFERED=16
export PYTHONUNBUFFERED=1

Copilot AI left a comment

Pull request overview

This PR adds Megatron backend support for LoRA training to the Miles project. The implementation includes disk-based weight synchronization and a tensor-based weight update mechanism for the SGLang rollout engine; the feature is still a work in progress.

Key Changes:

  • Added Megatron backend integration for LoRA training
  • Implemented disk sync weight functionality
  • Added tensor-based LoRA weight update mechanism (pending upstream SGLang PR)


--rollout-shuffle
--rm-type math
# --num-rollout 100
--num-rollout 10 # onyl train 10 stesp
Copilot AI commented Jan 7, 2026

Spelling errors: "onyl" should be "only" and "stesp" should be "steps". The comment should read "# only train 10 steps".

Suggested change
--num-rollout 10 # onyl train 10 stesp
--num-rollout 10 # only train 10 steps

CKPT_ARGS=(
--hf-checkpoint /root/Qwen2.5-0.5B-Instruct/
--ref-load /root/Qwen2.5-0.5B-Instruct_torch_dist/
# Uncomment to save checkpoints (required for LoRA)
Copilot AI commented Jan 7, 2026

The comment states "Uncomment to save checkpoints (required for LoRA)" but the checkpoint saving arguments on lines 25-26 are already active (not commented out). This creates confusion about whether checkpoints are being saved. Either update the comment to reflect that checkpoints are enabled, or comment out lines 25-26 if they should be optional.

Suggested change
# Uncomment to save checkpoints (required for LoRA)
# Save checkpoints (required for LoRA). Adjust path/interval as needed.

--target-modules "q_proj,k_proj,v_proj,o_proj"
# --target-modules "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
# --lora-sync-from-tensor # Use tensor-based sync (more efficient)
# Uncomment to share base model between actor and ref (saves memory)
Copilot AI commented Jan 7, 2026

The comment states "Uncomment to share base model between actor and ref (saves memory)" but the --share-ref-base-model argument on line 41 is already active (not commented out). This creates confusion. Either update the comment to reflect that sharing is enabled, or comment out line 41 if it should be optional.

Suggested change
# Uncomment to share base model between actor and ref (saves memory)
# Share base model between actor and ref (saves memory)
